Goto

Collaborating Authors

 concrete dropout


Concrete Dropout

Neural Information Processing Systems

Dropout is used as a practical tool to obtain uncertainty estimates in large vision models and reinforcement learning (RL) tasks. But to obtain well-calibrated uncertainty estimates, a grid-search over the dropout probabilities is necessary--a prohibitive operation with large models, and an impossible one with RL. We propose a new dropout variant which gives improved performance and better calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout's discrete masks. Together with a principled optimisation objective, this allows for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles. In RL this allows the agent to adapt its uncertainty dynamically as more data is observed. We analyse the proposed variant extensively on a range of tasks, and give insights into common practice in the field where larger dropout probabilities are often used in deeper model layers.



Concrete Dropout

Neural Information Processing Systems

Dropout is used as a practical tool to obtain uncertainty estimates in large vision models and reinforcement learning (RL) tasks. But to obtain well-calibrated uncertainty estimates, a grid-search over the dropout probabilities is necessary-- a prohibitive operation with large models, and an impossible one with RL. We propose a new dropout variant which gives improved performance and better calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout's discrete masks. Together with a principled optimisation objective, this allows for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles. In RL this allows the agent to adapt its uncertainty dynamically as more data is observed. We analyse the proposed variant extensively on a range of tasks, and give insights into common practice in the field where larger dropout probabilities are often used in deeper model layers.


Cosmic Microwave Background Recovery: A Graph-Based Bayesian Convolutional Network Approach

arXiv.org Artificial Intelligence

The cosmic microwave background (CMB) is a significant source of knowledge about the origin and evolution of our universe. However, observations of the CMB are contaminated by foreground emissions, obscuring the CMB signal and reducing its efficacy in constraining cosmological parameters. We employ deep learning as a data-driven approach to CMB cleaning from multi-frequency full-sky maps. In particular, we develop a graph-based Bayesian convolutional neural network based on the U-Net architecture that predicts cleaned CMB with pixel-wise uncertainty estimates. We demonstrate the potential of this technique on realistic simulated data based on the Planck mission. We show that our model accurately recovers the cleaned CMB sky map and resulting angular power spectrum while identifying regions of uncertainty. Finally, we discuss the current challenges and the path forward for deploying our model for CMB recovery on real observations.


Learning Uncertainty with Artificial Neural Networks for Improved Remaining Time Prediction of Business Processes

arXiv.org Artificial Intelligence

Artificial neural networks will always make a prediction, even when completely uncertain and regardless of the consequences. This obliviousness of uncertainty is a major obstacle towards their adoption in practice. Techniques exist, however, to estimate the two major types of uncertainty: model uncertainty and observation noise in the data. Bayesian neural networks are theoretically well-founded models that can learn the model uncertainty of their predictions. Minor modifications to these models and their loss functions allow learning the observation noise for individual samples as well. This paper is the first to apply these techniques to predictive process monitoring. We found that they contribute towards more accurate predictions and work quickly. However, their main benefit resides with the uncertainty estimates themselves that allow the separation of higher-quality from lower-quality predictions and the building of confidence intervals. This leads to many interesting applications, enables an earlier adoption of prediction systems with smaller datasets and fosters a better cooperation with humans.


Concrete Dropout

Neural Information Processing Systems

Dropout is used as a practical tool to obtain uncertainty estimates in large vision models and reinforcement learning (RL) tasks. But to obtain well-calibrated uncertainty estimates, a grid-search over the dropout probabilities is necessary--a prohibitive operation with large models, and an impossible one with RL. We propose a new dropout variant which gives improved performance and better calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout's discrete masks. Together with a principled optimisation objective, this allows for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles.


Deep Contextual Multi-armed Bandits

arXiv.org Machine Learning

Contextual multi-armed bandit problems arise frequently in important industrial applications. Existing solutions model the context either linearly, which enables uncertainty driven (principled) exploration, or non-linearly, by using epsilon-greedy exploration policies. Here we present a deep learning framework for contextual multi-armed bandits that is both non-linear and enables principled exploration at the same time. We tackle the exploration vs. exploitation trade-off through Thompson sampling by exploiting the connection between inference time dropout and sampling from the posterior over the weights of a Bayesian neural network. In order to adjust the level of exploration automatically as more data is made available to the model, the dropout rate is learned rather than considered a hyperparameter. We demonstrate that our approach substantially reduces regret on two tasks (the UCI Mushroom task and the Casino Parity task) when compared to 1) non-contextual bandits, 2) epsilon-greedy deep contextual bandits, and 3) fixed dropout rate deep contextual bandits. Our approach is currently being applied to marketing optimization problems at HubSpot.


Concrete Dropout

Neural Information Processing Systems

Dropout is used as a practical tool to obtain uncertainty estimates in large vision models and reinforcement learning (RL) tasks. But to obtain well-calibrated uncertainty estimates, a grid-search over the dropout probabilities is necessary-- a prohibitive operation with large models, and an impossible one with RL. We propose a new dropout variant which gives improved performance and better calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout's discrete masks. Together with a principled optimisation objective, this allows for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles. In RL this allows the agent to adapt its uncertainty dynamically as more data is observed. We analyse the proposed variant extensively on a range of tasks, and give insights into common practice in the field where larger dropout probabilities are often used in deeper model layers.


[R] [1705.07832] Concrete Dropout -- learnable dropout probabilities!! • r/MachineLearning

@machinelearnbot

The original one, Variational Dropout and the Local Reparameterization Trick is cited in the Concrete Dropout paper and is indeed somewhat limited, however this issue is resolved in Variational Dropout Sparsifies Deep Neural Networks (accepted to ICML '17, paper from my labmates). They have very strange excuse to avoid comparison with the last paper (IMO both methods use different relaxations, it'd be useful to compare them face-to-face) We chose not to compare to Gaussian dropout in our experiments, as when optimising Gaussian dropout's α following its variational interpretation [23], the method is known to underperform [28] UPD: there's also Generalized Dropout (uses straight through estimator, which is not unbiased gradient estimator, and Information Dropout that does not use binary formulation.


Concrete Dropout

arXiv.org Machine Learning

Dropout is used as a practical tool to obtain uncertainty estimates in large vision models and reinforcement learning (RL) tasks. But to obtain well-calibrated uncertainty estimates, a grid-search over the dropout probabilities is necessary-- a prohibitive operation with large models, and an impossible one with RL. We propose a new dropout variant which gives improved performance and better calibrated uncertainties. Relying on recent developments in Bayesian deep learning, we use a continuous relaxation of dropout's discrete masks. Together with a principled optimisation objective, this allows for automatic tuning of the dropout probability in large models, and as a result faster experimentation cycles. In RL this allows the agent to adapt its uncertainty dynamically as more data is observed. We analyse the proposed variant extensively on a range of tasks, and give insights into common practice in the field where larger dropout probabilities are often used in deeper model layers.